Final Report¶

Art Classifier

In [ ]:
import warnings
warnings.filterwarnings("ignore")

Environment¶

The project is expected to run in a virtual environment in python 3.10.8. Please install the necessary packages from requirements.txt. Only Windows with GPU is supported at this time.

Files¶

The code for running the project can be found on GitHub. The datasets can be found on Google Drive: wikiART224 and wikiART9.

The autoencoder also needs this which was too big for GitHub. The following files semART and autoencoder clusters need to be downloaded and extracted to the main directory.

Description¶

Welcome to our AI Art Curation System, a revolutionary tool designed to enhance the experience of art gallery and museum visitors. Our system employs advanced AI algorithms to identify the genre and style of artworks, providing visitors with an enriching and educational experience. By leveraging the power of AI, we aim to make art more accessible and enjoyable for everyone, from casual artgoers to seasoned enthusiasts.

In our end-to-end pipeline, we begin with a common input: images of fine arts. This data undergoes preprocessing before being fed into our first model, the U-Net encoder. This model extracts embeddings, which we utilize for KMeans clustering to derive cluster labels. Using these labels, we generate artwork recommendations based on embedding similarity. Moving forward, our second model employs ResNet50 for style classification, while our third model, employing ResNet101, focuses on genre classification. Each model serves a distinct purpose, collectively providing comprehensive insights into fine art characteristics.

image.png

Data Collection¶

In our project proposal we decided to use the WikiART General Dataset. However, the dataset had issues with metadata file and with missing/duplicate images. Additionally, it didn't have all the classes we wanted for a good representation of notable art styles. We decided to make our own dataset called wikiART224 and wikiART9, which we cited as an alternative in our proposal (WikiArt Dataset).

wikiART224 is created by adding zero-padding to the smallest dimension of the image until we reach 1:1 aspect ratio. The image is then resized to 224x224 for ResNET. wikiART9 is created by adding zero-padding to the smallest dimension of the image until we reach 1:1 aspect ratio. The image is then segmented into a 3x3 grid and then each subimage is resized to 224x224 for ResNET. The resulting metadata can be found in labels.csv. The target classes are art style and genre.

The datasets were created using the following python files: data_clean.ipynb and data_process.ipynb. The first file was used to clean/correct the dataset's metadata by adding in artist, style, and genre information which was partially missing from the source dataset's metadata as only some labels were in the classes.php file. The second file was used to transform the source data into our datasets. Both the datasets and metadata is hosted on our project's Google Drive. The python files are hosted on the project github, it is not suggested to run them as it will take a long time due to the size of the datasets (32 GB).

Furthermore, since we wanted a dataset with descriptions to use for the autoencoder we used the semART dataset for its training.

Summary of Data¶

Please note that wikiART9 has 9x the images of wikiART224 but the exact same distributions as they are subimages. The data will not be plotted twice for brevity.

In [ ]:
import pandas as pd
file_path = 'Dataset\\labels.csv'
df = pd.read_csv(file_path)
In [ ]:
print(f"Number of samples - wikiART224: {df.shape[0]}")
print(f"Number of samples - wikiART9: {9*df.shape[0]}")
Number of samples - wikiART224: 81444
Number of samples - wikiART9: 732996

We have 81444 and 732996 samples for wikiART224 and wikiART9, respectively.

In [ ]:
import plotly.io as pio
pio.renderers.default = 'notebook'
import plotly.express as px
import plotly.subplots as sp
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=False)
In [ ]:
# Show distribtion (barplot) art style
fig_art_style = px.bar(df['art_style'].value_counts(), x=df['art_style'].value_counts().index, y=df['art_style'].value_counts().values, labels={'y': 'Count', 'x': 'Art Style'},
                       title='Distribution of Data by Art Style')
fig_art_style.update_layout(width=1000, height=500)
fig_art_style.show()
In [ ]:
# Calculate the proportion of NaN values in the genre
nan_proportion_genre = df['genre'].isna().sum() / len(df)
# Create a pie chart for the proportion of NaN values in genre
fig_nan_genre = px.pie(names=['Missing Values', 'Present Values'],
                      values=[nan_proportion_genre, 1 - nan_proportion_genre],
                      title='Proportion of Missing Values in Genre Column')
fig_nan_genre.update_layout(width=600, height=400)
fig_nan_genre.show()

The data on wikiART does not have labels for genre for all images. Around 20% are unlabelled or belong to an other class with low representation.

In [ ]:
# Show distribtion (barplot) genre
fig_genre = px.bar(df['genre'].value_counts(), x=df['genre'].value_counts().index, y=df['genre'].value_counts().values, labels={'y': 'Count', 'x': 'Genre'},
                   title='Distribution of Data by Genre')
fig_genre.update_layout(width=800, height=500)
fig_genre.show()

Results Achieved¶

1. Style Classification Model¶

Please note that not all the codes are given below. Only the essential codes are shown for visibility.

In [ ]:
# needed libraries
import os
import zipfile

import random
from time import time
from tqdm import tqdm
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
import seaborn as sns
import joblib

from sklearn.metrics import confusion_matrix

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms, models
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import Subset

from PIL import Image
import csv

from sklearn.metrics import precision_score
from sklearn.metrics import precision_recall_fscore_support

from collections import Counter

1.1 Data Loading¶

WikiART224 images are used to solve the style classification problem. Transformation of normalization is applied to each images and loaded to train, validation, test loader by custom_dataloader function. The ratio of each dataset is 8:1:1.

In the beginning, the model performance wasn't improving enough when all 27 classes of the style images were used for fine tuning. Therefore, it was decided to drop classes with small number of images and use only 13 styles as the subset data. The fact that there is a data imbalance and differentiating between some style classes is difficult also support this decision. The selected classes are as follows.

In [ ]:
used_class = ['Abstract_Expressionism',
              'Art_Nouveau_Modern',
              'Baroque',
              'Cubism',
              'Expressionism',
              'Impressionism',
              'Naive_Art_Primitivism',
              'Northern_Renaissance',
              'Post_Impressionism',
              'Realism',
              'Rococo',
              'Romanticism',
              'Symbolism']
In [ ]:
# image transformation
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
In [ ]:
def custom_dataloader(data_dir, transform, batch_size=64, num_workers=1):
  raw_dataset = datasets.ImageFolder(root=data_dir, transform=transform)
  print(raw_dataset)

  num_classes = len(raw_dataset.classes)
  print("Number of classes:", num_classes)

  np.random.seed(1000) 
  indices = np.arange(len(raw_dataset))
  np.random.shuffle(indices)
  train_split = int(len(indices) * 0.8)
  testval_split = train_split + int(len(indices) * (1 - 0.8)/2)

  # split into training and validation indices
  relevant_train_indices, relevant_val_indices,test_indices = indices[:train_split], indices[train_split:testval_split] ,indices[testval_split:]
  train_sampler = SubsetRandomSampler(relevant_train_indices)
  train_loader = torch.utils.data.DataLoader(raw_dataset, batch_size=batch_size,
                                             num_workers=num_workers, sampler=train_sampler)
  val_sampler = SubsetRandomSampler(relevant_val_indices)
  val_loader = torch.utils.data.DataLoader(raw_dataset, batch_size=batch_size,
                                            num_workers=num_workers, sampler=val_sampler)
  test_sampler = SubsetRandomSampler(test_indices)
  test_loader = torch.utils.data.DataLoader(raw_dataset, batch_size=batch_size,
                                           num_workers=num_workers, sampler=test_sampler)
  print(f'data loading completed')
  return train_loader, val_loader, test_loader
In [ ]:
### MODIFY according to your directory###
'''
rawdata_dir: file path of the extracted data. e.g. /file/path/of/data/folder
save_dir: file path for the saved data. e.g. /file/path/to/save/results/
'''
rawdata_dir = '/content/extracted_data_subset13'
save_dir = '/content/drive/MyDrive/UofT/MIE1517/project/subset fine tune/'
In [ ]:
train_loader, val_loader, test_loader = custom_dataloader(rawdata_dir, transform)
Dataset ImageFolder
    Number of datapoints: 69125
    Root location: /content/extracted_data_subset13/wikiART224
    StandardTransform
Transform: Compose(
               Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=True)
               ToTensor()
               Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
           )
Number of classes: 13
data loading completed

Samples of the images used are visualized as below. The images look darker than the original ones because it was normalized.

image.png

1.2 Model Architecture¶

ResNet50 was implemented as the base model. ResNet50 has 50 layers and residual learning block, expected to exhibit good performance. After the transfer learning, the last fully connected layers are added to achieve the style classification task.

image.png

In [ ]:
class StyleResNet50_5(nn.Module):
  def __init__(self, hidden_dim1=1024, hidden_dim2=256):
    super(StyleResNet50_5, self).__init__()

    resnet = models.resnet50(pretrained=True)
    self.resnet_features = nn.Sequential(*list(resnet.children())[:-1])

    self.fc1 = nn.Linear(2048, hidden_dim1)
    self.fc2 = nn.Linear(hidden_dim1, hidden_dim2)
    self.fc3 = nn.Linear(hidden_dim2, 13)
    self.dropout = nn.Dropout(p=0.5)
    self.flatten = nn.Flatten()
    self.leaky_relu = nn.LeakyReLU(negative_slope=0.1, inplace=True)


  def forward(self, x):
    x = self.resnet_features(x)
    x = self.flatten(x)
    x = self.dropout(x)
    x = self.leaky_relu(self.fc1(x))
    x = self.dropout(x)
    x = self.leaky_relu(self.fc2(x))
    x = self.dropout(x)
    x = self.fc3(x)

    return x

1.3 Training¶

Training code is as follows. Every epoch took about 10 minutes in colab environment.

In [ ]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"device: {device}")
In [ ]:
def train_model(model, model_name, train_loader, valid_loader, device, save_dir, num_epochs=4, lr=0.0001, wd=0, clip_gradient=None):
    train_losses = []
    valid_losses = []
    train_accuracies = []
    valid_accuracies = []
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.train()

    optimizer = Adam(model.parameters(), lr=lr, weight_decay=wd)
    criterion = nn.CrossEntropyLoss()

    start_time = time.time()
    for epoch in range(num_epochs):

        running_loss = 0.0
        running_pred = 0.0
        running_total = 0.0
        tqdm_bar = tqdm(train_loader)

        for batch_idx, (inputs, labels) in enumerate(tqdm_bar):
            with torch.set_grad_enabled(True):
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                outputs = model(inputs)
                loss = criterion(outputs, labels)

                loss.backward()
                if clip_gradient is not None:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_gradient)
                optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                pred = outputs.max(1, keepdim=True)[1]
                running_pred += pred.eq(labels.view_as(pred)).sum().item()
                running_total += pred.shape[0]


        train_acc = running_pred / running_total
        train_loss = running_loss / running_total
        train_losses.append(train_loss)
        train_accuracies.append(train_acc)
        running_loss = 0.0
        running_pred = 0.0
        running_total = 0.0
        valid_loss, valid_acc = evaluate_model(model, valid_loader, device)
        valid_losses.append(valid_loss)
        valid_accuracies.append(valid_acc)
        print(f'Epoch [{epoch+1}/{num_epochs}], Step [{batch_idx+1}/{len(train_loader)}], '
              f'Training Loss: {train_loss:.4f}, Training Accuracy: {train_acc:.4f}, '
              f'Validation Loss: {valid_loss:.4f}, Validation Accuracy: {valid_acc:.4f}')
        model.train()

        joblib.dump(model, f'{save_dir}{model_name}_epoch{epoch+1}.joblib')

        results = pd.DataFrame({'train loss': train_losses, 'valid loss': valid_losses, 'train accuracy': train_accuracies, 'valid accuracy': valid_accuracies})
        results.to_csv(f'{save_dir}{model_name}_results.csv')
    end_time = time.time()
    elapsed_time = end_time - start_time
    plot_curves(train_losses, valid_losses, train_accuracies, valid_accuracies, elapsed_time)


    return model, train_losses, valid_losses, train_accuracies, valid_accuracies

Additional utility function are as given below

In [ ]:
def plot_curves(train_losses, valid_losses, train_accuracies, valid_accuracies, elapsed_time):
    print(f'elapsed time: {elapsed_time}')
    iterations = range(1, len(train_losses) + 1)
    # Plot Loss Curve
    plt.figure(figsize=(10, 5))
    plt.plot(iterations, train_losses, label='Train Loss', color='blue')
    plt.plot(iterations, valid_losses, label='Validation Loss', color='orange')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training and Validation Loss')
    plt.legend()
    plt.grid(True)
    plt.show()
    epochs = range(1, len(train_accuracies) + 1)
    # Plot Accuracy Curve
    plt.figure(figsize=(10, 5))
    plt.plot(epochs, train_accuracies, label='Train Accuracy', color='blue')
    plt.plot(epochs, valid_accuracies, label='Validation Accuracy', color='orange')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.title('Training and Validation Accuracy')
    plt.legend()
    plt.grid(True)
    plt.show()
In [ ]:
def evaluate_model(model, data_loader, device):
    model.eval()  # Set the model to evaluation mode
    total_loss = 0.0
    total_correct = 0
    total_samples = 0

    criterion = nn.CrossEntropyLoss()
    with torch.no_grad():
        for data, labels in data_loader:
          data, labels = data.to(device), labels.to(device)
          output = model(data)
          loss = criterion(output, labels)
          total_loss += loss.item() * data.size(0)
          pred = output.max(1, keepdim=True)[1]
          total_correct += pred.eq(labels.view_as(pred)).sum().item()
          total_samples += data.size(0)

    accuracy = total_correct / total_samples
    average_loss = total_loss / total_samples

    return average_loss, accuracy

1.4 Results¶

1.4.1 Hyperparameter: Batch Size¶

Out of all the hyperparameters that were tuned, the relationship between batch size and learning rate are the most interesting points and useful for projects using ResNet. Typically to speed up training, a larger learning rate or batch size can be chosen. From the hyperparameter tuning, we observed that higher learning rates was not very successful, while increasing batch size had little effect on the metrics, will decreasing training time (more noticeable when ResNet is frozen and pre-computed). The same relationship was observed in blog posts where they brought in the concept of forgetfulness to explain this occurence. Since our training data has ~76,000 images, training with large batches makes sense to speed up our training if we can maintain a similar performance. Given the results of the hyperparameter tuning, we suggest people training ResNet-based projects to use higher batch sizes as long as they have enough memory.

In [ ]:
# Load the batch size results    
bs_data = pd.read_csv('styleCNN_bs.csv')

# Plot validation accuracy vs epochs for the current batch size
fig_batch = go.Figure()

for batch_size in bs_data['batch_size'].unique():
    batch_data = bs_data[bs_data['batch_size'] == batch_size]

    fig_batch.add_trace(go.Scatter(x=batch_data['epoch'], y=batch_data['valid accuracy'],
                             mode='lines', name=f'Batch Size {batch_size}'))

fig_batch.update_layout(title='Validation Accuracy vs Epoch for Each Batch Size',
                  xaxis_title='Epoch',
                  yaxis_title='Validation Accuracy')
fig_batch.update_layout(width=800, height=500)
fig_batch.show()
1.4.2 Tuned Model¶

After various hyperparameter tuning and modification of last fully connected layers, given architecture explained in 1.3 with learning rate 0.00001, clip gradient 1.0, and epoch 4 was selected as the best model. The training curve is given as below.

In [ ]:
model = joblib.load('/best_model_wikiART224_style.joblib')

image.png image-2.png

As can be seen with the training curves, the train accuracy goes up to 80% but the validation and test accuracy is quite low, reaching 58% for the maximum. We introduced a new metric top-3-accuracy to evaluate the model performance. Top-3-accuracy computes the number of times where the correct label is among the top 3 labels predicted. Since there are similar styles and styles which overlap, top-3-accuracy can be a reasonable metric to evaluate the style classification. With top-3-accuracy, the trained model exhibits a reasonably good performance.

In [ ]:
losses = pd.read_csv('style_cnn_kaccuracy.csv', index_col=0) 
losses
Out[ ]:
losses accuracy k accuracy
train 0.588948 0.803671 0.962333
val 1.270694 0.583623 0.865307
test 1.270794 0.587444 0.862867

To further validate the suitability of top-k-accuracy as a metric, we analyzed the probability distribution of the top predictions for images that were misclassified but still contained the true label within the top 3 predictions. The analysis revealed that the misclassified probability hovered around 50%, indicating that even in cases of misclassification, the correct label still often have a relatively high probability among the top predictions.

image.png

The analysis of precision, recall, and F1 score highlights that post-impressionism, expressionism, and symbolism have the lowest F1 scores in the validation set. This outcome primarily arises from the model frequently misclassifying expressionism images as post-impressionism. This confusion is understandable due to the similarities between these two movements, characterized by bold colors and dynamic brushwork, as demonstrated in the provided example. Additionally, the classification of symbolism as an art movement rather than a consistent style results in diverse predictions across various styles, further lowering the F1 score.

In [ ]:
precision_metrics = pd.read_csv('style_cnn_precision_metrics.csv', index_col=0) 
precision_metrics
Out[ ]:
train precision val precision test precision train recall val recall test recall train f1 val f1 test f1
Abstract_Expressionism 0.828008 0.609195 0.662069 0.942947 0.770909 0.683274 0.881748 0.680578 0.672504
Art_Nouveau_Modern 0.826470 0.630178 0.625668 0.758074 0.514493 0.528217 0.790796 0.566489 0.572827
Baroque 0.865530 0.645161 0.643172 0.909251 0.668258 0.703614 0.886852 0.656506 0.672037
Cubism 0.877181 0.712418 0.750000 0.742614 0.473913 0.502041 0.804308 0.569191 0.601467
Expressionism 0.780848 0.501859 0.478689 0.695853 0.397644 0.445802 0.735904 0.443714 0.461660
Impressionism 0.845899 0.680534 0.694570 0.832359 0.676287 0.685778 0.839074 0.678404 0.690146
Naive_Art_Primitivism 0.843149 0.626667 0.662252 0.696721 0.417778 0.452489 0.762973 0.501333 0.537634
Northern_Renaissance 0.932958 0.802326 0.795122 0.908774 0.736655 0.654618 0.920707 0.768089 0.718062
Post_Impressionism 0.639342 0.434426 0.398524 0.756606 0.566412 0.533773 0.693048 0.491716 0.456338
Realism 0.748386 0.514844 0.523988 0.864515 0.660781 0.651445 0.802270 0.578755 0.580806
Rococo 0.823860 0.560000 0.612440 0.870392 0.551724 0.627451 0.846487 0.555831 0.619855
Romanticism 0.852184 0.579023 0.598829 0.813904 0.562064 0.591040 0.832604 0.570418 0.594909
Symbolism 0.861722 0.642361 0.613333 0.644550 0.405702 0.379381 0.737480 0.497312 0.468790

image.png

1.5 Implementation on new data¶

Various images were tesetd and the results are as below.

image.png

1.6 Image Segmentation¶

Many literatures reported that augmentating the fine art images can improve the style classification model. Therefore, WikiART9 dataset wich is segmented into 9 pieces was tried out for training to further improve the style classification model. The data loading, model architecture and training process was same as the wikiART224 dataset explained above.

image-3.png

However, the prediction accuracy was 58.7%, similar with the model trained with wikiART224. Although the model is not significantly better, if this model can be combined with another model which predicts the final style from probabilities obtained from our model, there is a probability that the result is better.

In [ ]:
best_model_wikiART9 = joblib.load('best_model_wikiART9_style.joblib')

image.png

2. Genre Classification Model¶

In [ ]:
# Import packages
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns
import pandas as pd
import os
import zipfile
from torch.utils.data import SubsetRandomSampler, DataLoader
from PIL import Image
from tqdm import tqdm
import csv
import random

2.1 Data loading¶

In [ ]:
def custom_dataloader(data_dir, batch_size=64, num_workers=1):

    '''
    input:
    data_dir: file path of input data. raw_dataset are created from that file
    batch_size
    num_workers
    output:
    train_loader, val_loader, test_loader
    -----
    data is transformed by data_transform. train:val:test = 0.8:0.1:0.1 with each classes equally splitted
    '''
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    
    raw_dataset = datasets.ImageFolder(root=data_dir, transform=transform)
    np.random.seed(1000) # Fixed numpy random seed for reproducible shuffling
    indices = np.arange(len(raw_dataset))
    np.random.shuffle(indices)
    train_split = int(len(indices) * 0.8)
    # dividing by 2 will assign 10% to val and 10% to test
    # if the train is 0.8
    testval_split = train_split + int(len(indices) * (1 - 0.8)/2)
    # split into training and validation indices
    relevant_train_indices, relevant_val_indices,test_indices = indices[:train_split], indices[train_split:testval_split] ,indices[testval_split:]
    train_sampler = SubsetRandomSampler(relevant_train_indices)
    train_loader = torch.utils.data.DataLoader(raw_dataset, batch_size=batch_size,
                                             num_workers=num_workers, sampler=train_sampler)
    val_sampler = SubsetRandomSampler(relevant_val_indices)
    val_loader = torch.utils.data.DataLoader(raw_dataset, batch_size=batch_size,
                                            num_workers=num_workers, sampler=val_sampler)
    test_sampler = SubsetRandomSampler(test_indices)
    test_loader = torch.utils.data.DataLoader(raw_dataset, batch_size=batch_size,
                                           num_workers=num_workers, sampler=test_sampler)
    print(f'data loading completed')
    return train_loader, val_loader, test_loader

# Function to display images from a batch
def show_images(images, labels, nrows, ncols):
    fig, axes = plt.subplots(nrows, ncols, figsize=(10, 10))

    for i, ax in enumerate(axes.flat):
        # Display image
        ax.imshow(np.transpose(images[i], (1, 2, 0)))
        ax.set_title(f"Label: {labels[i]}")
        ax.axis('off')

    plt.tight_layout()
    plt.show()
In [ ]:
# Modify the dataset_dir variable according to your local directory structure
dataset_dir = 'C:/Users/ASUS/Documents/UofT MEng/Winter 2023-2024/MIE1517 Introduction to Deep Learning/Course project/Data/wikiART224_genre'

# Call the custom_dataloader function to create train, validation, and test data loaders
train_loader, val_loader, test_loader = custom_dataloader(data_dir=dataset_dir)
data loading completed
In [ ]:
# Display a few images from the train loader
images, labels = next(iter(train_loader))
show_images(images, labels, nrows=4, ncols=4)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
No description has been provided for this image

The following transformation was included in the custom_dataloader code to preprocess the images before they are fed into the model. This transform consists of the following operations:

  • Resize: Resizes the images to a fixed size of 224x224 pixels. This ensures that all images have the same dimensions.

  • ToTensor: Converts the images into PyTorch tensors.

  • Normalize: Normalizes the pixel values of the images. This step subtracts the mean (0.5, 0.5, 0.5) and divides by the standard deviation (0.5, 0.5, 0.5) for each color channel (RGB).

These transformations ensure that the images are preprocessed consistently and appropriately for the model. As a result, the appearance of the images may be different when viewed directly, and appear darker than they are originally.

2.2 Model Architecture¶

The following cell contains the architecture for the best-performing model among all the architectures that were tried and trained.

In [ ]:
# Load pre-trained ResNet model
resnet = models.resnet101(pretrained=True)
resnet.name = 'ResNet101'

# Freeze parameters in ResNet architecture
for param in resnet.parameters():
    param.requires_grad = False

# Modify the top layer
resnet.fc = nn.Sequential(
    nn.Linear(resnet.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 66),
    nn.ReLU(),
    nn.Linear(66, 11))

The genre classification model employed ResNet101 as its backbone architecture. At the end of the ResNet101 model, fully connected layers with sizes of 256, 128, 66, and a final layer with a size adjusted to the number of classes in the dataset (fc11) were added for classification purposes.

image.png

2.3 Model Training¶

The training code along with utility functions are listed below.

In [ ]:
def get_model_name(name, batch_size, epoch):
    """ Generate a name for the model consisting of all the hyperparameter values

    Args:
        config: Configuration object containing the hyperparameters
    Returns:
        path: A string with the hyperparameter name and value concatenated
    """
    path = "model_{0}_bs{1}_epoch{2}".format(name,batch_size,epoch)
    return path
In [ ]:
# Function to evaluate the model
def evaluate_model(model, criterion, data_loader, num_classes, k=3):
    """ 
    Inputs:
    - model: model for genre classification
    - criterion:
    - data_loader: data_loader used to evaluate the model on
    - num_classes: number of classes in the data_loader
    - k: number of top accuracies to produce accuracy@k
    
    Outputs:
    - running_loss/len(data_loader): average loss per iteration
    - correct/total: accuracy
    - correct_at_k / total: accuracy@k
    - precisions: list of precision scores for each class
    - recalls: list of recall scores for each class
    - f1_scores: list of F1 scores for each classs
    """
    
    model.eval()
    
    #############################################
    # To Enable GPU Usage
    if use_cuda and torch.cuda.is_available():
        model.cuda()
    ############################################# 
    
    true_positives = [0] * num_classes
    false_positives = [0] * num_classes
    false_negatives = [0] * num_classes
    correct = 0
    total = 0
    correct_at_k = 0
    running_loss = 0.0
    
    for imgs, labels in data_loader:

        #############################################
        #To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        #############################################
        
        output = model(imgs)
        loss = criterion(output, labels) # Compute the total loss
        running_loss += loss.item()  # Add the loss to the running_loss
        
        # Get top-k predictions
        _, pred_topk = output.topk(k, dim=1)
        
        #select index with maximum prediction score
        preds = output.max(1, keepdim=True)[1]
        correct += preds.eq(labels.view_as(preds)).sum().item()
        
        # Check if true label is in top-k predictions
        for i in range(len(labels)):
            if labels[i] in pred_topk[i]:
                correct_at_k += 1
                
        total += imgs.shape[0]
        
        # Update counts of true positives, false positives, and false negatives
        for pred, label in zip(preds, labels):
            if pred == label:
                true_positives[pred] += 1
            else:
                false_positives[pred] += 1
                false_negatives[label] += 1
      
    precisions = []
    recalls = []
    f1_scores = []
    for i in range(num_classes):
        tp = true_positives[i]
        fp = false_positives[i]
        fn = false_negatives[i]

        precision = tp / (tp + fp + 1e-9)
        recall = tp / (tp + fn + 1e-9)
        
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(2 * (precision * recall) / (precision + recall + 1e-9))      
        
    return running_loss/len(data_loader), correct / total, correct_at_k / total, precisions, recalls, f1_scores
In [ ]:
def train(model, dataset_dir, batch_size=64, learning_rate=0.001, num_epochs=10, start_epoch=0):

    #############################################
    # To Enable GPU Usage
    if use_cuda and torch.cuda.is_available():
        model.cuda()
    #############################################

    # Instantiate data_loader for training and validation datasets
    train_loader, val_loader, _ = get_data_loaders(dataset_dir, batch_size)

    # Set criterion to CE
    criterion = nn.CrossEntropyLoss()

    # Set optimizer to Adam
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Instantiate empty lists to store iterations, losses, and training and validation accuracies
    train_results = {}
    train_results['epochs'] = []
    train_results['loss'] = []
    train_results['accuracy'] = []
    train_results['accuracy@3'] = []
    train_results['precision'] = []
    train_results['recall'] = []
    train_results['F1_score'] = []
    
    val_results = {}
    val_results['epochs'] = []
    val_results['loss'] = []
    val_results['accuracy'] = []
    val_results['accuracy@3'] = []
    val_results['precision'] = []
    val_results['recall'] = []
    val_results['F1_score'] = []

    # Training
    print('Training has started!')
    epoch_idx = 0
    
    # Iterate through epochs
    for epoch in range(num_epochs):
        # Set running_loss to 0.0
        running_loss = 0.0
        i = 0

        # Create tqdm progress bar for training batches
        loop = tqdm(train_loader, desc=f'Epoch {epoch}', leave=False)

        for imgs, labels in loop:
            #############################################
            # To Enable GPU Usage
            if use_cuda and torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
            #############################################

            out = model(imgs)             # Forward pass
            loss = criterion(out, labels) # Compute the total loss
            loss.backward()               # Backward pass (compute parameter updates)
            optimizer.step()              # Make the updates for each parameter
            optimizer.zero_grad()         # A clean-up step for PyTorch
            running_loss += loss.item()  # Add the loss to the running_loss

            loop.set_description(f"Epoch [{epoch+1}/{num_epochs}]") # Set description of progress bar
            loop.set_postfix(loss=running_loss/(i+1))
            i += 1

        if epoch % 2 == 0:
            # Save the training information per epoch
            train_loss, train_acc, train_acc_k3, train_prec, train_rec, train_f1 = evaluate_model(model, criterion, train_loader, len(train_loader.dataset.classes), k=3)
            train_results['epochs'].append(epoch + start_epoch)
            train_results['loss'].append(train_loss)
            train_results['accuracy'].append(train_acc)
            train_results['accuracy@3'].append(train_acc_k3)
            train_results['precision'].append(train_prec)
            train_results['recall'].append(train_rec)
            train_results['F1_score'].append(train_f1)  

            val_loss, val_acc, val_acc_k3, val_prec, val_rec, val_f1 = evaluate_model(model, criterion, val_loader, len(val_loader.dataset.classes), k=3)
            val_results['epochs'].append(epoch + start_epoch)
            val_results['loss'].append(val_loss)
            val_results['accuracy'].append(val_acc)
            val_results['accuracy@3'].append(val_acc_k3)
            val_results['precision'].append(val_prec)
            val_results['recall'].append(val_rec)
            val_results['F1_score'].append(val_f1)
            
            # Save the current model (checkpoint) to a file at every other epoch
            model_path = get_model_name(model.name, batch_size, epoch + start_epoch)
            torch.save(model.state_dict(), model_path)

            # Print Training and validation accuracy at each epoch
            print(f'Epoch {epoch}-Training accuracy: {train_results["accuracy"][epoch_idx]}-Validation accuracy: {val_results["accuracy"][epoch_idx]}-Training loss: {train_results["loss"][epoch_idx]}-Validation loss: {val_results["loss"][epoch_idx]}')
            epoch_idx += 1


    print('Training done!')

    return train_results, val_results

2.4 Model Results¶

In [ ]:
# Import training results from 'final_report_data_corrected_loss.csv'
model_results_df = pd.read_csv('final_report_data_corrected_loss.csv')

# Separate training and validation results
model_train_results = model_results_df.loc[model_results_df['dataset']=='training']
model_val_results = model_results_df.loc[model_results_df['dataset']=='validation']
2.4.1 Learning curve¶
In [ ]:
# Set the style of the plot
sns.set(style="whitegrid")

# Plot Learning Curves
plt.figure(figsize=(12, 6))

# Plot Learning Curve for Loss
plt.subplot(1, 2, 1)
sns.lineplot(x='epochs', y='loss', data=model_train_results, label='Training Loss')
sns.lineplot(x='epochs', y='loss', data=model_val_results, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Learning Curve')
plt.legend()

# Plot Learning Curve for Accuracy
plt.subplot(1, 2, 2)
sns.lineplot(x='epochs', y='accuracy', data=model_train_results, label='Training Accuracy')
sns.lineplot(x='epochs', y='accuracy', data=model_val_results, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy Learning Curve')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

In the experimentation process, various hyperparameter settings were explored to optimize the model's performance. This involved adjusting parameters such as batch size, regularization through dropout layers, and learning rates. After conducting rigorous experimentation, it was determined that using a batch size of 64 and a learning rate of 0.001, alongside the provided model architecture outlined in section 2.2, yielded the most promising results in terms of accuracy.

2.4.2 Test Results¶
In [ ]:
# Load best model state
model_path = 'model_ResNet101_bs64_lr0.001_epoch8'
state = torch.load(model_path)
resnet.load_state_dict(state)
Out[ ]:
<All keys matched successfully>
In [ ]:
# Evaluate model
use_cuda = True

# Evaluate model on train_loader
train_loss, train_acc, train_acc_at3, train_precision, train_recall, train_f1_score = evaluate_model(resnet, nn.CrossEntropyLoss(), train_loader, len(train_loader.dataset.classes))
# Evaluate model on val_loader
val_loss, val_acc, val_acc_at3, val_precision, val_recall, val_f1_score = evaluate_model(resnet, nn.CrossEntropyLoss(), val_loader, len(val_loader.dataset.classes))
# Evaluate model on test_loader
test_loss, test_acc, test_acc_at3, test_precision, test_recall, test_f1_score = evaluate_model(resnet, nn.CrossEntropyLoss(), test_loader, len(test_loader.dataset.classes))
2.4.3 Accuracy and accuracy@3¶
In [ ]:
# Create a DataFrame to store the metrics of the best model
best_model_acc = pd.DataFrame({'accuracy':[train_acc,val_acc,test_acc],
                               'k accuracy':[train_acc_at3, val_acc_at3, test_acc_at3]},
                              index = ['training','validation','test'])

# Display the metrics DataFrame
display(best_model_acc)
accuracy k accuracy
training 0.654674 0.923285
validation 0.614224 0.894620
test 0.610685 0.895869

Accuracy at 3 (k accuracy) is particularly valuable in fine arts genre classification due to the inherent ambiguity and blurred boundaries between genres. In fine arts, genres are often not distinct categories but rather fluid and overlapping concepts. Artworks may exhibit characteristics of multiple genres simultaneously, making them challenging to classify into a single category.

For example, consider a painting that depicts both a landscape and a portrait. Such artworks blur the edges between genres, making it difficult to assign them to a single category. In this context, accuracy at 3 becomes crucial as it allows for a more nuanced evaluation of the model's performance. Instead of expecting the model to predict a single genre with absolute certainty, accuracy at 3 measures the proportion of correct predictions within the top 3 predicted genres. This metric acknowledges the complexity of fine arts classification and provides a more realistic assessment of the model's ability to capture the subtle nuances and interconnections between genres.

2.4.4 Precision, Recall, and F1 score¶
In [ ]:
# Store list of classes in test_loader as 'classes'
classes = test_loader.dataset.classes

# Create a DataFrame to store the metrics of the best model
best_model_metrics = pd.DataFrame({'train precision':train_precision,
                                   'val precision':val_precision,
                                   'test precision': test_precision,
                                   'train recall':train_recall,
                                   'val recall':val_recall,
                                   'test recall':test_recall,
                                   'train F1 score':train_f1_score,
                                   'val F1 score':val_f1_score,
                                   'test F1 score':test_f1_score},
                                  index=classes)

# Display the metrics DataFrame
display(best_model_metrics)
train precision val precision test precision train recall val recall test recall train F1 score val F1 score test F1 score
Unknown 0.561092 0.506564 0.509245 0.453305 0.413098 0.403049 0.501472 0.455082 0.449966
abstract_painting 0.743299 0.720755 0.740809 0.824805 0.776423 0.791749 0.781933 0.747554 0.765432
cityscape 0.738296 0.684358 0.679525 0.532374 0.502049 0.489316 0.618650 0.579196 0.568944
genre_painting 0.587552 0.554054 0.512644 0.466363 0.408145 0.430918 0.519990 0.470036 0.468241
illustration 0.769958 0.631068 0.592920 0.476283 0.380117 0.348958 0.588519 0.474453 0.439344
landscape 0.697008 0.666280 0.670343 0.879768 0.861423 0.859387 0.777796 0.751388 0.753184
nude_painting 0.679632 0.555556 0.580952 0.425914 0.380435 0.348571 0.523659 0.451613 0.435714
portrait 0.728293 0.707663 0.702737 0.813639 0.778329 0.786325 0.768604 0.741315 0.742185
religious_painting 0.644478 0.579495 0.578488 0.655511 0.588235 0.590504 0.649948 0.583832 0.584435
sketch_and_study 0.517047 0.486188 0.503484 0.728603 0.717391 0.688095 0.604860 0.579583 0.581489
still_life 0.604190 0.532967 0.525424 0.742883 0.695341 0.628378 0.666396 0.603421 0.572308
2.4.5 Confusion matrix¶
In [ ]:
#Create lists to store labels and predictions
all_labels = []
all_predictions = []

# Iterate over the test loader
for imgs, labels in test_loader:
    
    #############################################
    # To Enable GPU Usage
    if use_cuda and torch.cuda.is_available():
        imgs = imgs.cuda()
        labels = labels.cuda()
    #############################################

    # Forward pass
    output = resnet(imgs)

    # Select index with maximum prediction score
    pred = output.max(1, keepdim=True)[1]

    # Convert tensors to numpy arrays and append to lists
    all_labels.extend(labels.cpu().numpy())
    all_predictions.extend(pred.cpu().numpy())
In [ ]:
# Convert lists to numpy arrays
all_labels = np.array(all_labels)
all_predictions = np.array(all_predictions)

# List of all target classes
classes = os.listdir(dataset_dir)

# Calculate confusion matrix
conf_matrix = confusion_matrix(all_labels, all_predictions)

# Convert confusion matrix to DataFrame for better visualization
conf_matrix_df = pd.DataFrame(conf_matrix, index=classes, columns=classes)

# Plot confusion matrix heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix_df, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix (Test set)")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.tight_layout()
plt.show()
No description has been provided for this image

For some classes, such as "Unknown," "genre_painting," and "illustration," the precision, recall, and F1 scores are relatively low. This could be due to several reasons:

  • Ambiguity and Overlap: Certain genres, like "genre_painting" and "illustration", exhibit significant overlap with other genres, making them inherently challenging to distinguish. The model may struggle to correctly classify artworks that blur the boundaries between these genres.
  • Complexity of Artistic Styles: Fine arts genres can encompass a wide range of artistic styles and interpretations, making them inherently subjective and difficult to categorize. The model may have struggled to capture the nuanced characteristics of certain genres, leading to lower classification accuracy.

Conversely, classes like "landscape," "portrait," and "abstract_painting" exhibit higher precision, recall, and F1 scores. This could be attributed to several factors:

  • Distinctive Features: These genres have more distinct visual features or motifs that make them easier to classify accurately. For example, "landscape" paintings feature recognizable natural landscapes, while "portrait" paintings focus on depicting individuals.
  • Consistency in Style: These genres have more consistent artistic styles or conventions across artworks, making them easier for the model to learn and distinguish.

2.5 Implementation on new data¶

image.png

image.png

image.png

image.png

3. Artwork Recommender¶

3.1 Import dataset and Train¶

In order to recommend the image with detailed description that are similar to the given art pieces, the team leverage Semart dataset.

In [ ]:
# Adjust the paths to where your datasets are located
train_df = pd.read_csv('./SemArt/semart_test.csv', sep='\t', encoding='ISO-8859-1')
val_df = pd.read_csv('./SemArt/semart_val.csv', sep='\t', encoding='ISO-8859-1')
test_df = pd.read_csv('./SemArt/semart_test.csv', sep='\t', encoding='ISO-8859-1')
In [ ]:
all_data_df = pd.concat([train_df, val_df, test_df], ignore_index=True)
In [ ]:
all_data_df
IMAGE_FILE DESCRIPTION AUTHOR TITLE TECHNIQUE DATE TYPE SCHOOL TIMEFRAME
0 41294-10ladisl.jpg Of the Hungarian kings St Ladislas is perhaps ... UNKNOWN MASTER, Hungarian Saint Ladislaus, King of Hungary Oil on wood, 103 x 101,3 cm c. 1600 religious Hungarian 1551-1600
1 42791-1sacris.jpg This ceiling painting in the sacristy of San S... VERONESE, Paolo Coronation of the Virgin Oil on canvas, 200 x 170 cm 1555 religious Italian 1551-1600
2 14376-worship.jpg In the same period when the most talented arti... FRANCKEN, Frans II Worship of the Golden Calf Oil on panel, 60 x 88 cm - religious Flemish 1601-1650
3 24776-annuncia.jpg Based on its style the Annunciation is attribu... MASTER of Flémalle Annunciation Tempera on oak, 61 x 63,7 cm 1420s religious Flemish 1401-1450
4 23845-3manet04.jpg The 1870s were rich in female models for Manet... MANET, Edouard Brunette with Bare Breasts Oil on canvas, 60 x 49 cm c. 1872 portrait French 1851-1900
... ... ... ... ... ... ... ... ... ...
3202 08082-gondola.jpg In this painting Carus also shows the figures ... CARUS, Carl Gustav A Gondola on the Elbe near Dresden Oil on canvas, 29 x 22 cm 1827 landscape German 1801-1850
3203 32349-17ignazi.jpg The effect of the simulated cupola rests large... POZZO, Andrea Painting on the pendentive: Samson Fresco 1685 religious Italian 1651-1700
3204 35839-valkhof1.jpg The picture shows the Valkhof at Nijmegen with... RUYSDAEL, Salomon van The Valkhof at Nijmegen Oil on canvas, 73 x 103 cm 1650s landscape Dutch 1601-1650
3205 40789-crucifix.jpg This unusually violent Crucifix was probably t... UNKNOWN MASTER, Bohemian Crucifixion Panel, 67 x 30 cm c. 1360 religious Bohemian 1351-1400
3206 22274-abductio.jpg This panel belonged to a cassone from which an... LIBERALE da Verona The Abduction of Helen of Troy Oil on poplar panel, 41 x 110 cm c. 1470 historical Italian 1451-1500

3207 rows × 9 columns

In [ ]:
all_data_df.to_csv("./SemArt/semart_desc_all.csv", index=False, sep='\t', encoding='utf-8')
In [ ]:
all_data_df.iloc[0]['IMAGE_FILE']
'41294-10ladisl.jpg'

Train code

In [ ]:
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

def train(args, gen=None):

    # Numpy random seed
    npr.seed(args.seed)

    # Save directory
    save_dir = "outputs/" + args.experiment_name

    # LOAD THE MODEL
    if gen is None:
        Net = globals()[args.model]
        gen = Net(args.kernel, args.num_filters)

    # LOSS FUNCTION
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(gen.parameters(), lr=args.learn_rate)

    # DATA
    print("Loading data...")
    _, train_loader, _, _, _ = cd.get_data_loader("./SemArt/all_image",batch_size=args.batch_size,resize=args.resize)

    #print("Transforming data...")
    #train_rgb, train_grey = process(x_train, y_train, downsize_input=args.downsize_input)
    #test_rgb, test_grey = process(x_test, y_test, downsize_input=args.downsize_input)

    # Create the outputs folder if not created already
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    print("Beginning training ...")
    if args.gpu:
        gen.cuda()

    for epoch in range(args.epochs):
        # Train the Model
        gen.train()  # Change model to 'train' mode
        losses = []
        for i, data in enumerate(train_loader,0):
            inputs, _,_,_= data

            #############################################
            #To Enable GPU Usage
            if torch.cuda.is_available():
              inputs = inputs.cuda()
            #############################################

            # Forward + Backward + Optimize
            optimizer.zero_grad()
            outputs = gen(inputs)

            loss = criterion(outputs, inputs)
            loss.backward()
            optimizer.step()
            losses.append(loss.data.item())

        print(epoch, loss.cpu().detach())
        #if epoch%5 == 0 and args.plot:
        visualize(inputs, outputs, args.gpu, 1)
        # Save the model state dictionary
        model_save_path = os.path.join(save_dir, f'model_epoch_{epoch}.pth')
        torch.save(gen.state_dict(), model_save_path)

    return gen

3.2 Model Architecture¶

In order to extract the encoder from the Autoencoder I will try to have two separate class to edfine encoder and decoder. Then another class to assemble this structure. As I discuss with professor lot of skip connection mmight be useful for regeneration but this may not necessarily translate into better feature space. so I will use just one skip connection at the start of the decoder this has helped the autoencoder to be good but still don't have too much hint.

In [ ]:
class UNetEncoder(nn.Module):
    def __init__(self, kernel, num_filters, num_in_channels=3):
        super(UNetEncoder, self).__init__()
        stride = 2
        padding = kernel // 2

        self.downconv1 = nn.Sequential(
            nn.Conv2d(num_in_channels, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.MaxPool2d(2))
        
        self.downconv2 = nn.Sequential(
            nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU(),
            nn.MaxPool2d(2))
        
        self.downconv3 = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters*4, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*4),
            nn.ReLU(),
            nn.MaxPool2d(2)
            )
        
        self.rfconv = nn.Sequential(
            nn.Conv2d(num_filters*4, num_filters*4, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*4),
            nn.ReLU())

    def forward(self, x):
        x1 = self.downconv1(x)
        x2 = self.downconv2(x1)
        x3 = self.downconv3(x2)
        x_rf = self.rfconv(x3)
        return x1, x2, x3, x_rf
In [ ]:
class UNetDecoder_simp(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=3):
        super(UNetDecoder_simp, self).__init__()
        padding = kernel // 2

        self.upconv1 = nn.Sequential(
            nn.ConvTranspose2d(num_filters*4, num_filters*2, kernel_size=kernel, stride=2, padding=padding, output_padding=1),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU())

        self.upconv2 = nn.Sequential(
            nn.ConvTranspose2d(num_filters*4, num_filters, kernel_size=kernel, stride=2, padding=padding, output_padding=1),
            nn.BatchNorm2d(num_filters),
            nn.ReLU())

        self.upconv3 = nn.Sequential(
            nn.ConvTranspose2d(num_filters, num_filters, kernel_size=kernel, stride=2, padding=padding, output_padding=1),
            nn.BatchNorm2d(num_filters),
            nn.ReLU())

        self.finalconv = nn.Conv2d(num_filters, num_colours, kernel_size=kernel, padding=padding)

    def forward(self, x1, x2, x3, x_rf, original_x):
        x_up1 = self.upconv1(x_rf)
        x_up1_skip = torch.cat([x_up1, x2], dim=1)
        x_up2 = self.upconv2(x_up1_skip)
        x_up3 = self.upconv3(x_up2)
        out = self.finalconv(x_up3)
        return out
In [ ]:
class UNet_simp(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=3):
        super(UNet_simp, self).__init__()
        self.encoder = UNetEncoder(kernel, num_filters, num_in_channels)
        self.decoder = UNetDecoder_simp(kernel, num_filters, num_colours, num_in_channels)

    def forward(self, x):
        x1, x2, x3, x_rf = self.encoder(x)
        out = self.decoder(x1, x2, x3, x_rf, x)
        return out

This is one of the instance of the training model. Due to computatioanl limitation minimal hyperparameter tuning was done. mostly decreasing model complexity to overcome overfitting.

In [ ]:
# Total training time 102min
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "model": "UNet_simp",
    "kernel": 5,
    "num_filters": 10,
    'learn_rate':0.01,
    "batch_size": 64,
    "epochs": 3,
    "seed": 0,
    "resize": transforms.Resize((512,512)),
    "plot": True,
    "experiment_name": "UNet_SEMART",
    "visualize": False,
    "downsize_input": False,
}
Loading data...
Beginning training ...
0 tensor(0.0037)
No description has been provided for this image
1 tensor(0.0051)
No description has been provided for this image
2 tensor(0.0026)
No description has been provided for this image

In the case of training autoencoder, various hyperparamter was explored. adding more skip connection didn't helped. shrink size of the model size itself which helped while donig a clustering. Minor epoch, learning rate and batchsize was explored too. Increasing size of the image hugely impact training time which was around 1 hour and 30 min which can be explored as a future work.

3.3 Using art recommendation system¶

3.3.1 Load models¶

Kmeans, Unet, and embeddings

In [ ]:
import pickle

# Replace 'your_file.pkl' with the path to your pickle file
pickle_file_path = './outputs/encoded_features_desc_w_Cluster_unique.pkl'

with open(pickle_file_path, 'rb') as file:
    data = pickle.load(file)

df_cluster = pd.DataFrame(data)
In [ ]:
from joblib import dump, load

# Assuming `kmeans` is your trained KMeans model
model_filename = './outputs/kmeans_model.joblib'
kmeans = load(model_filename)
In [ ]:
_, train_loader_wiki, val_loader_wiki, test_loader_wiki, _ = cd.get_data_loader("./wikiART224",resize=transforms.Resize((224,224)),batch_size=1,normailze = True)
print('training examples: ',len(train_loader_wiki))
print('validation examples: ',len(val_loader_wiki))
print('testing examples: ', len(test_loader_wiki))
training examples:  65153
validation examples:  8144
testing examples:  8145
In [ ]:
df_cluster
IMAGE_FILE DESCRIPTION AUTHOR TITLE TECHNIQUE DATE TYPE SCHOOL TIMEFRAME Encoded Features Cluster
0 41294-10ladisl.jpg Of the Hungarian kings St Ladislas is perhaps ... UNKNOWN MASTER, Hungarian Saint Ladislaus, King of Hungary Oil on wood, 103 x 101,3 cm c. 1600 religious Hungarian 1551-1600 [0.66586727, 0.51203525, 0.0, 0.0, 0.0, 0.0, 0... 0
1 42791-1sacris.jpg This ceiling painting in the sacristy of San S... VERONESE, Paolo Coronation of the Virgin Oil on canvas, 200 x 170 cm 1555 religious Italian 1551-1600 [0.4588764, 0.31383383, 0.30628043, 0.316805, ... 10
2 14376-worship.jpg In the same period when the most talented arti... FRANCKEN, Frans II Worship of the Golden Calf Oil on panel, 60 x 88 cm - religious Flemish 1601-1650 [0.45856556, 0.32140628, 0.31562173, 0.3044343... 10
3 24776-annuncia.jpg Based on its style the Annunciation is attribu... MASTER of Flémalle Annunciation Tempera on oak, 61 x 63,7 cm 1420s religious Flemish 1401-1450 [0.0, 0.0, 0.0, 0.33191314, 0.44521868, 0.7014... 6
4 23845-3manet04.jpg The 1870s were rich in female models for Manet... MANET, Edouard Brunette with Bare Breasts Oil on canvas, 60 x 49 cm c. 1872 portrait French 1851-1900 [0.61417955, 0.56172717, 0.60303986, 0.5860305... 4
... ... ... ... ... ... ... ... ... ... ... ...
2133 28424-winter.jpg This signed painting depicts a winter landscap... MOLENAER, Klaes Winter Landscape Oil on oak panel, 39 x 33 cm - landscape Dutch 1651-1700 [0.75753313, 0.62387294, 0.5723128, 0.4696973,... 8
2134 14184-08bolt.jpg At first sight, this painting describing the f... FRAGONARD, Jean-Honoré The Bolt Oil on canvas, 73 x 93 cm c. 1777 genre French 1751-1800 [0.5883344, 0.5935383, 0.6462424, 0.6168197, 0... 5
2135 19609-nordlin1.jpg This is the outside right wing of the high alt... HERLIN, Friedrich Family of the Founder Jakob Fuchsart Wood, 89 x 66 cm 1462-65 religious German 1451-1500 [0.5969873, 0.53667825, 0.6016903, 0.6184438, ... 11
2136 35406-08assum.jpg The Antwerp Cathedral was given a new marble h... RUBENS, Peter Paul Assumption of the Virgin Oil on panel, 490 x 325 cm 1626 religious Flemish 1601-1650 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... 19
2137 28428-winterl2.jpg This painting depicts a winter landscape with ... MOLENAER, Klaes Winter Landscape Oil on oak panel, 37 x 49 cm - landscape Dutch 1651-1700 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... 13

2138 rows × 11 columns

In [ ]:
from scipy.spatial.distance import cdist


model_unet = UNet_simp(kernel=3, num_filters=32, num_colours=3, num_in_channels=3)
# Ensure you're loading the model on the correct device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the state dictionary
model_state_dict = torch.load('./outputs/UNet_SEMART/model_epoch_2.pth', map_location=device)

# Update the model with the loaded state dictionary
model_unet.load_state_dict(model_state_dict)
<All keys matched successfully>
3.3.2 Get Helper code¶

To do this first I have to clusternew image to get the cluster. Get the cluster and compare point within that cluster then get the closest three.

In [ ]:
def get_top_similar_file(new_point):
    cluster_label = kmeans.predict(new_point)[0]
    
    # Extract points belonging to the same cluster as the new point
    #same_cluster_indices = np.where(labels == cluster_label)[0]
    same_cluster_df = df_cluster[df_cluster["Cluster"] == cluster_label]
    # Assuming encoded features are stored as lists; convert them to numpy arrays
    cluster_encoded_features = np.stack(same_cluster_df['Encoded Features'].values)

    # Calculate distances from the new point to each point in the same cluster
    distances = cdist(new_point.reshape(1, -1), cluster_encoded_features, metric='cosine').flatten()

    # Get the indices of the top 3 smallest distances
    top_3_indices = distances.argsort()[:3]

    # Retrieve relevant information for the top 3 closest points
    top_3_info = same_cluster_df.iloc[top_3_indices]

    print("Top 3 similar images to the new point are:", top_3_info["IMAGE_FILE"])

    return top_3_info["IMAGE_FILE"],top_3_info
In [ ]:
def encode_image(img,model):
    if torch.cuda.is_available():
        inputs = img.cuda()
        model = model.cuda()

    with torch.no_grad():  # Ensuring no gradients are computed
        outputs = model.encoder(inputs)

    vec = outputs[-1]  # This selects the last tensor from the tuple
    # Flatten the selected tensor to 1D and move it to CPU
    vec = vec.view(vec.size(0), -1).cpu().numpy()
    return vec
In [ ]:
def find_and_display_image(image_file_name, search_directory = "./SemArt/all_image/Images"):
    """
    Searches for an image file in the specified directory and its subdirectories,
    and displays the image if found.
    
    Parameters:
    - image_file_name: The name of the image file to find (e.g., 'example.jpg').
    - search_directory: The root directory to start the search from.
    """
    found = False
    
    # Walk through all directories and files within the search directory
    for root, dirs, files in os.walk(search_directory):
        if image_file_name in files:
            # Construct the full path to the image
            image_path = os.path.join(root, image_file_name)
            print(f"Image found: {image_path}")
            
            # Load and display the image
            img = Image.open(image_path)
            plt.imshow(img)
            plt.axis('off')  # Hide axes
            plt.show()
            
            found = True
            break  # Exit the loop once the image is found and displayed
    
    if not found:
        print("Image not found.")
3.3.3 Using this system¶

Example of implementation on new data. This autoencoder was trained with semart to get description. So later cell will be showing use cases for the wikiart data. This code below will recommend top 3 image that are similar and its description.

In [ ]:
counter = 0
# assuming we only input one Image where batch size  = 1
for images, labels, filenames, sublabels in test_loader_wiki:
    print(f"================{counter}=================")
    print(f"Image Input to {filenames[0]}")
    # Rearrange the axes from (1, 3, 224, 224) to (224, 224, 3)
    img = np.transpose(images, (0, 2, 3, 1)).squeeze(0)

    # Display the image
    plt.imshow(img)
    plt.axis('off')  # Optionally remove the axis
    plt.show()

    emb = encode_image(images,model_unet)
    top_3_filenames, infos = get_top_similar_file(emb)
    print(infos)
    for fn in top_3_filenames:
        find_and_display_image(fn)
    if counter == 2:
        break
    counter +=1
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
================0=================
Image Input to claude-monet_poplars-on-the-banks-of-the-river-epte-seen-from-the-marsh-1892.jpg
No description has been provided for this image
Top 3 similar images to the new point are: 527     04775-landsca2.jpg
169     21079-birchtre.jpg
1728    04486-river_la.jpg
Name: IMAGE_FILE, dtype: object
              IMAGE_FILE                                        DESCRIPTION  \
527   04775-landsca2.jpg  This painting is the pendant of the Landscape ...   
169   21079-birchtre.jpg  Klodt is considered to be a major figure in Ru...   
1728  04486-river_la.jpg  The painting shows a river landscape with fish...   

                              AUTHOR                  TITLE  \
527           BLOEMEN, Jan Frans van      Italian Landscape   
169   KLODT, Mikhail Konstantinovich  Under the Birch Trees   
1728            BEYEREN, Abraham van        River Landscape   

                      TECHNIQUE     DATE       TYPE   SCHOOL  TIMEFRAME  \
527   Oil on canvas, 48 x 38 cm  c. 1735  landscape  Flemish  1651-1700   
169   Oil on canvas, 27 x 48 cm     1874  landscape  Russian  1851-1900   
1728   Oil on panel, 61 x 94 cm        -  landscape    Dutch  1651-1700   

                                       Encoded Features  Cluster  
527   [0.023636755, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....        5  
169   [0.7051101, 0.7938929, 0.8542635, 0.8649235, 0...        5  
1728  [0.0056887576, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0...        5  
Image found: ./SemArt/all_image/Images\04775-landsca2.jpg
No description has been provided for this image
Image found: ./SemArt/all_image/Images\21079-birchtre.jpg
No description has been provided for this image
Image found: ./SemArt/all_image/Images\04486-river_la.jpg
No description has been provided for this image
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
================1=================
Image Input to gregoire-boonzaier_district-six-1.jpg
No description has been provided for this image
Top 3 similar images to the new point are: 1011    00813-marco_p3.jpg
1630      44746-silver.jpg
264      21928-6carnat.jpg
Name: IMAGE_FILE, dtype: object
              IMAGE_FILE                                        DESCRIPTION  \
1011  00813-marco_p3.jpg  This picture is the second from the left on th...   
1630    44746-silver.jpg  This small painting from Villa Medici in Rome ...   
264    21928-6carnat.jpg  A work that would seem to evoke the sketches o...   

                 AUTHOR                                   TITLE  \
1011      ANGELICO, Fra  Saint Cosmas and Saint Damian Salvaged   
1630     ZUCCHI, Jacopo                           Age of Silver   
264   LEONARDO da Vinci            The Madonna of the Carnation   

                                  TECHNIQUE     DATE          TYPE   SCHOOL  \
1011  Tempera and gold on panel, 38 x 45 cm  1438-40     religious  Italian   
1630                Oil on wood, 50 x 39 cm  c. 1587  mythological  Italian   
264              Oil on panel, 62 x 47,5 cm  1478-80     religious  Italian   

      TIMEFRAME                                   Encoded Features  Cluster  
1011  1401-1450  [0.65917206, 0.5971162, 0.5720089, 0.59600925,...       19  
1630  1551-1600  [0.5111796, 0.22443137, 0.0, 0.0, 0.0, 0.0, 0....       19  
264   1451-1500  [0.6672655, 0.5776128, 0.57957566, 0.5855584, ...       19  
Image found: ./SemArt/all_image/Images\00813-marco_p3.jpg
No description has been provided for this image
Image found: ./SemArt/all_image/Images\44746-silver.jpg
No description has been provided for this image
Image found: ./SemArt/all_image/Images\21928-6carnat.jpg
No description has been provided for this image
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
================2=================
Image Input to ivan-shishkin_forest-stream-2.jpg
No description has been provided for this image
Top 3 similar images to the new point are: 1699    21402-woodedla.jpg
781       28588-paris5.jpg
1227    25344-adoratio.jpg
Name: IMAGE_FILE, dtype: object
              IMAGE_FILE                                        DESCRIPTION  \
1699  21402-woodedla.jpg  The picture shows a wooded landscape with figu...   
781     28588-paris5.jpg  Flags were flown on 30 June, 1878, to mark the...   
1227  25344-adoratio.jpg  In this work by Mazzolino the sharpness of lin...   

                   AUTHOR                                              TITLE  \
1699      LAMBERT, George                                   Wooded Landscape   
781         MONET, Claude  Rue Montorgueil in Paris, Celebration of 30 Ju...   
1227  MAZZOLINO, Ludovico                         Adoration of the Shepherds   

                        TECHNIQUE     DATE        TYPE   SCHOOL  TIMEFRAME  \
1699    Oil on canvas, 52 x 65 cm     1725   landscape  English  1701-1750   
781     Oil on canvas, 81 x 51 cm     1878  historical   French  1851-1900   
1227  Oil on wood, 79,5 x 60,5 cm  1520-24   religious  Italian  1501-1550   

                                       Encoded Features  Cluster  
1699  [0.656446, 0.61934847, 0.6199076, 0.6001728, 0...       10  
781   [0.51246387, 0.23570238, 0.38361302, 0.5123436...       10  
1227  [0.5227404, 0.34734032, 0.38964167, 0.4373181,...       10  
Image found: ./SemArt/all_image/Images\21402-woodedla.jpg
No description has been provided for this image
Image found: ./SemArt/all_image/Images\28588-paris5.jpg
No description has been provided for this image
Image found: ./SemArt/all_image/Images\25344-adoratio.jpg
No description has been provided for this image

The model I can say is mixed performance first K means clustering seems to be working great. If you see the top 3 images seem to be very similar so it has captured meaningful things. The autoencoder that embedded the new test data was a bit underperforming but this is understandable since the variation of training data was hugely different and autoencoders do have trouble when they do tasks on unseen data. If the given dataset was large it would be better since, especially for old art pieces overfitting is also more susceptible since there will be no new image created.

4. Generalization¶

We were originally planning on using art from the AGO, but due to the strike we changed to the ROM. One unfortunate issue with the ROM is that most of the art is unlabelled and we were not able to find more information using their online collection database.

Without data augmentation or training on photos (only trained on art scans), the model performed decently.

The following images are of Ancient Greek and Korean (Joseon) artworks, which the model was not trained on. We can see that for Korean art, it also suggests the style is Ukiko-e which is a good prediction since it shares some similarity due to their shared East Asian origin. For the Ancient Greek art, the model suggests minimalism and abstract art. Although this is not the correct classification, it is probably the closest classes that could be assigned based on the model's available classes. Looking at the original Ancient Greek artwork, there are aspects of minimalist and abstract art with the abstract and monotone figures.

korean-1.png greek-1.png

5. Application¶

The app requires the user to crop the image, similar to Google Lens. After pressing the generate button, the app will output the top 3 predicted labels for style and genre, and a recommended artwork and its title and description.

The app hasn't been posted to Hugging Face Spaces yet. We're waiting until the next update to gradio where they fix the crop feature and the ability to choose which camera to use on a mobile device. For the moment, the app can be used locally. We will update the github page when we release the app. To access the app right now, you can just run app-demo.py.

app.png

A key feature of the app is the feedback mechanism which is used for the evaluation of the recommended artwork. Since the quality of the recommended artwork is based on a subjective evaluation, we will use the user feedback to evaluate and further improve our model. The user can rate the recommended artwork out of 10 and correct any mistakes from the classification. The app will also save the current input and outputs and store them for improving our model at a later date.

feedback.png

6. Model Comparison¶

Here are some related works with their respective performances.

Source Project #1 Project #2 Published research
Accuracy 57% [1] 57.80% [2] 71.24% [3]

Future Work¶

  • dockerize
  • deploy app on huggingface/spaces
  • fine-tune resnet
  • data augmentation
  • image segmentation